<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" data-whc_version="25.0">
    <head><link rel="shortcut icon" href="../../../oxygen-webhelp/template/images/favicon.png"/><link rel="icon" href="../../../oxygen-webhelp/template/images/favicon.png"/><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><meta http-equiv="X-UA-Compatible" content="IE=edge"/><meta name="copyright" content="(C) Copyright 2024"/><meta name="generator" content="DITA-OT"/><meta name="description" content="This solution describes how to convert Avro files to the columnar format, Parquet. Let's say that you want to store data on HDFS using the columnar format, Parquet. But Data Collector doesn't have a ..."/><meta name="prodname" content="Data Collector"/><meta name="version" content="3"/><meta name="release" content="16"/><meta name="modification" content="0"/>        
      <title>Converting Data to the Parquet Data Format</title><!--  Generated with Oxygen version 25.1, build number 2023042410.  --><meta name="wh-path2root" content="../../../"/><meta name="wh-toc-id" content="concept_jkm_rnz_kx-d16893e60178"/><meta name="wh-source-relpath" content="datacollector/UserGuide/Solutions/Parquet.dita"/><meta name="wh-out-relpath" content="datacollector/UserGuide/Solutions/Parquet.html"/>

    <link rel="stylesheet" type="text/css" href="../../../oxygen-webhelp/app/commons.css?buildId=2023042410"/>
    <link rel="stylesheet" type="text/css" href="../../../oxygen-webhelp/app/topic.css?buildId=2023042410"/>

    <script src="../../../oxygen-webhelp/app/options/properties.js?buildId=20240802104629"></script>
    <script src="../../../oxygen-webhelp/app/localization/strings.js?buildId=2023042410"></script>
    <script src="../../../oxygen-webhelp/app/search/index/keywords.js?buildId=20240802104629"></script>
    <script defer="defer" src="../../../oxygen-webhelp/app/commons.js?buildId=2023042410"></script>
    <script defer="defer" src="../../../oxygen-webhelp/app/topic.js?buildId=2023042410"></script>
<link rel="stylesheet" type="text/css" href="../../../oxygen-webhelp/template/light.css?buildId=2023042410"/><link rel="stylesheet" type="text/css" href="../../../skin.css"/></head>

    <body class="wh_topic_page frmBody">
        
        
        

        
<nav class="navbar navbar-default wh_header" data-whc_version="25.0">
    <div class="container-fluid">
        <div class="wh_header_flex_container navbar-nav navbar-expand-md navbar-dark">
            <div class="wh_logo_and_publication_title_container">
                <div class="wh_logo_and_publication_title">
                    
                    <!--
                            This component will be generated when the next parameters are specified in the transformation scenario:
                            'webhelp.logo.image' and 'webhelp.logo.image.target.url'.
                            See: http://oxygenxml.com/doc/versions/17.1/ug-editor/#topics/dita_webhelp_output.html.
                    -->
                    
                    <div class=" wh_publication_title "><a href="../../../index.html"><span class="booktitle">  <span class="ph mainbooktitle"><span class="ph">Data Collector</span> User Guide</span>  </span></a></div>
                    
                </div>
                
                <!-- The menu button for mobile devices is copied in the output only when the 'webhelp.show.top.menu' parameter is set to 'yes' -->
                
            </div>

            <div class="wh_top_menu_and_indexterms_link collapse navbar-collapse">
                
                
                <div class=" wh_indexterms_link "><a href="../../../indexTerms.html" title="Index" aria-label="Go to index terms page"><span>Index</span></a></div>
                
            </div>
        </div>
    </div>
</nav>

        <div class=" wh_search_input navbar-form wh_topic_page_search search " role="form">


<form id="searchForm" method="get" role="search" action="../../../search.html"><div><input type="search" placeholder="Search " class="wh_search_textfield" id="textToSearch" name="searchQuery" aria-label="Search query" required="required"/><button type="submit" class="wh_search_button" aria-label="Search"><span class="search_input_text">Search</span></button></div></form>

</div>
        
        <div class="container-fluid">
            <div class="row">

                <nav class="wh_tools d-print-none">
                    
<div data-tooltip-position="bottom" class=" wh_breadcrumb "><ol class="d-print-none"><li><span class="home"><a href="../../../index.html"><span>Home</span></a></span></li><li><div class="topicref" data-id="concept_zq5_pb4_flb"><div class="title"><a href="../../../datacollector/UserGuide/Solutions/Solutions-title.html">Solutions</a></div></div></li><li class="active"><div class="topicref" data-id="concept_jkm_rnz_kx"><div class="title"><a href="../../../datacollector/UserGuide/Solutions/Parquet.html#concept_jkm_rnz_kx">Converting Data to the Parquet Data Format</a><div class="wh-tooltip"><p class="shortdesc"></p></div></div></div></li></ol></div>



                    <div class="wh_right_tools "><button class="wh_hide_highlight" aria-label="Toggle search highlights" title="Toggle search highlights"></button><button class="webhelp_expand_collapse_sections" data-next-state="collapsed" aria-label="Collapse sections" title="Collapse sections"></button><div class=" wh_navigation_links "><span id="topic_navigation_links" class="navheader">
  
<span class="navprev"><a class="- topic/link link" href="../../../datacollector/UserGuide/Solutions/Overview.html#concept_aw1_p1q_plb" title="Solutions Overview" aria-label="Previous topic: Solutions Overview" rel="prev"></a></span>  
<span class="navnext"><a class="- topic/link link" href="../../../datacollector/UserGuide/Solutions/Impala.html#concept_szz_xwm_lx" title="Automating Impala Metadata Updates for Drift Synchronization for Hive" aria-label="Next topic: Automating Impala Metadata Updates for Drift Synchronization for Hive" rel="next"></a></span>  </span></div>
<!--External resource link-->
<div class=" wh_print_link print d-none d-md-inline-block "><button onClick="window.print()" title="Print this page" aria-label="Print this page"></button></div>
                        
                        
                        
                        
                    </div>
                </nav>
            </div>

            

<div class="wh_content_area">
                <div class="row">
                    


                        <nav role="navigation" id="wh_publication_toc" class="col-lg-3 col-md-3 col-sm-12 d-md-block d-none d-print-none">
<div id="wh_publication_toc_content">


                            <div class=" wh_publication_toc " data-tooltip-position="right"><span class="expand-button-action-labels"><span id="button-expand-action" role="button" aria-label="Expand"></span><span id="button-collapse-action" role="button" aria-label="Collapse"></span><span id="button-pending-action" role="button" aria-label="Pending"></span></span><ul role="tree" aria-label="Table of Contents"><li role="treeitem" aria-expanded="false"><div data-tocid="concept_htw_ghg_jq-d16893e53" class="topicref" data-id="concept_htw_ghg_jq" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_htw_ghg_jq-d16893e53-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Getting_Started/GettingStarted_Title.html#concept_htw_ghg_jq" id="concept_htw_ghg_jq-d16893e53-link">Getting Started</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_l2v_nlp_mpb-d16893e331" class="topicref" data-id="concept_l2v_nlp_mpb" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_l2v_nlp_mpb-d16893e331-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/ReleaseNotes/ReleaseNotes.html#concept_l2v_nlp_mpb" id="concept_l2v_nlp_mpb-d16893e331-link">Release Notes</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_l4q_flb_kr-d16893e2582" class="topicref" data-id="concept_l4q_flb_kr" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_l4q_flb_kr-d16893e2582-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Installation/Install_title.html" id="concept_l4q_flb_kr-d16893e2582-link">Installation</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_ylh_yyz_ky-d16893e3984" class="topicref" data-id="concept_ylh_yyz_ky" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_ylh_yyz_ky-d16893e3984-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Configuration/Config_title.html" id="concept_ylh_yyz_ky-d16893e3984-link">Configuration</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_ejk_f1f_5v-d16893e7058" class="topicref" data-id="concept_ejk_f1f_5v" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_ejk_f1f_5v-d16893e7058-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Upgrade/Upgrade_title.html" id="concept_ejk_f1f_5v-d16893e7058-link">Upgrade</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_qsw_cjy_bt-d16893e10103" class="topicref" data-id="concept_qsw_cjy_bt" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_qsw_cjy_bt-d16893e10103-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Pipeline_Design/PipelineDesign_title.html" id="concept_qsw_cjy_bt-d16893e10103-link">Pipeline Concepts and Design</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_qn1_wn4_kq-d16893e11199" class="topicref" data-id="concept_qn1_wn4_kq" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_qn1_wn4_kq-d16893e11199-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Pipeline_Configuration/PipelineConfiguration_title.html" id="concept_qn1_wn4_kq-d16893e11199-link">Pipeline Configuration</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_hdr_gyw_41b-d16893e13057" class="topicref" data-id="concept_hdr_gyw_41b" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_hdr_gyw_41b-d16893e13057-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Data_Formats/DataFormats-Title.html" id="concept_hdr_gyw_41b-d16893e13057-link">Data Formats</a><div class="wh-tooltip"><p class="shortdesc"></p></div></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_yjl_nc5_jq-d16893e14164" class="topicref" data-id="concept_yjl_nc5_jq" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_yjl_nc5_jq-d16893e14164-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Origins/Origins_title.html" id="concept_yjl_nc5_jq-d16893e14164-link">Origins</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_yjl_nc5_jq-d16893e35197" class="topicref" data-id="concept_yjl_nc5_jq" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_yjl_nc5_jq-d16893e35197-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Processors/Processors_title.html" id="concept_yjl_nc5_jq-d16893e35197-link">Processors</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_agj_cfj_br-d16893e44037" class="topicref" data-id="concept_agj_cfj_br" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_agj_cfj_br-d16893e44037-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Destinations/Destinations-title.html" id="concept_agj_cfj_br-d16893e44037-link">Destinations</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_umc_1lk_fx-d16893e56072" class="topicref" data-id="concept_umc_1lk_fx" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_umc_1lk_fx-d16893e56072-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Executors/Executors-title.html" id="concept_umc_1lk_fx-d16893e56072-link">Executors</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_xxd_f5r_kx-d16893e59696" class="topicref" data-id="concept_xxd_f5r_kx" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_xxd_f5r_kx-d16893e59696-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Event_Handling/EventFramework-Title.html#concept_xxd_f5r_kx" id="concept_xxd_f5r_kx-d16893e59696-link">Dataflow Triggers</a></div></div></li><li role="treeitem" aria-expanded="true"><div data-tocid="concept_zq5_pb4_flb-d16893e60134" class="topicref" data-id="concept_zq5_pb4_flb" data-state="expanded"><span role="button" tabindex="0" aria-labelledby="button-collapse-action concept_zq5_pb4_flb-d16893e60134-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Solutions/Solutions-title.html" id="concept_zq5_pb4_flb-d16893e60134-link">Solutions</a></div></div><ul role="group" class="navbar-nav nav-list"><li role="treeitem"><div data-tocid="concept_aw1_p1q_plb-d16893e60156" class="topicref" data-id="concept_aw1_p1q_plb" data-state="leaf"><span role="button" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Solutions/Overview.html#concept_aw1_p1q_plb" id="concept_aw1_p1q_plb-d16893e60156-link">Solutions Overview </a></div></div></li><li role="treeitem" class="active"><div data-tocid="concept_jkm_rnz_kx-d16893e60178" class="topicref" data-id="concept_jkm_rnz_kx" data-state="leaf"><span role="button" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Solutions/Parquet.html#concept_jkm_rnz_kx" id="concept_jkm_rnz_kx-d16893e60178-link">Converting Data to the Parquet Data Format</a><div class="wh-tooltip"><p class="shortdesc"></p></div></div></div></li><li role="treeitem"><div data-tocid="concept_szz_xwm_lx-d16893e60202" class="topicref" data-id="concept_szz_xwm_lx" data-state="leaf"><span role="button" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Solutions/Impala.html#concept_szz_xwm_lx" id="concept_szz_xwm_lx-d16893e60202-link">Automating Impala Metadata Updates for Drift Synchronization for Hive</a></div></div></li><li role="treeitem"><div data-tocid="concept_d1q_xl4_lx-d16893e60224" class="topicref" data-id="concept_d1q_xl4_lx" data-state="leaf"><span role="button" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Solutions/FileManagement.html#concept_d1q_xl4_lx" id="concept_d1q_xl4_lx-d16893e60224-link">Managing Output Files</a><div class="wh-tooltip"><p class="shortdesc"></p></div></div></div></li><li role="treeitem"><div data-tocid="concept_kff_ykv_lz-d16893e60248" class="topicref" data-id="concept_kff_ykv_lz" data-state="leaf"><span role="button" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Solutions/StopPipeline.html#concept_kff_ykv_lz" id="concept_kff_ykv_lz-d16893e60248-link">Stopping a Pipeline After Processing All Available Data</a><div class="wh-tooltip"><p class="shortdesc"></p></div></div></div></li><li role="treeitem"><div data-tocid="concept_vrh_jrs_bbb-d16893e60272" class="topicref" data-id="concept_vrh_jrs_bbb" data-state="leaf"><span role="button" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Solutions/SqoopReplacement.html#concept_vrh_jrs_bbb" id="concept_vrh_jrs_bbb-d16893e60272-link">Offloading Data from Relational Sources to Hadoop</a></div></div></li><li role="treeitem"><div data-tocid="concept_t2t_lp5_xz-d16893e60294" class="topicref" data-id="concept_t2t_lp5_xz" data-state="leaf"><span role="button" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Solutions/SendEmail.html#concept_t2t_lp5_xz" id="concept_t2t_lp5_xz-d16893e60294-link">Sending Email During Pipeline Processing</a></div></div></li><li role="treeitem"><div data-tocid="concept_ocb_nnl_px-d16893e60316" class="topicref" data-id="concept_ocb_nnl_px" data-state="leaf"><span role="button" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Solutions/EventStorage.html#concept_ocb_nnl_px" id="concept_ocb_nnl_px-d16893e60316-link">Preserving an Audit Trail of Events</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_a5b_wvk_ckb-d16893e60338" class="topicref" data-id="concept_a5b_wvk_ckb" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_a5b_wvk_ckb-d16893e60338-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Solutions/DeltaLake.html#concept_a5b_wvk_ckb" id="concept_a5b_wvk_ckb-d16893e60338-link">Loading Data into Databricks Delta Lake</a><div class="wh-tooltip"><p class="shortdesc"></p></div></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_phk_bdf_2w-d16893e60616" class="topicref" data-id="concept_phk_bdf_2w" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_phk_bdf_2w-d16893e60616-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Solutions/HiveDrift-Overview.html#concept_phk_bdf_2w" id="concept_phk_bdf_2w-d16893e60616-link">Drift Synchronization Solution for Hive</a><div class="wh-tooltip"><p class="shortdesc"></p></div></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_ljq_knr_4cb-d16893e61090" class="topicref" data-id="concept_ljq_knr_4cb" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_ljq_knr_4cb-d16893e61090-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Solutions/JDBC_DriftSyncSolution.html#concept_ljq_knr_4cb" id="concept_ljq_knr_4cb-d16893e61090-link"><span class="ph">Drift Synchronization Solution for PostgreSQL</span></a></div></div></li></ul></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_ugp_kwf_xw-d16893e61337" class="topicref" data-id="concept_ugp_kwf_xw" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_ugp_kwf_xw-d16893e61337-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/DPM/DPM_title.html" id="concept_ugp_kwf_xw-d16893e61337-link">StreamSets Control Hub</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_fyf_gkq_4bb-d16893e62693" class="topicref" data-id="concept_fyf_gkq_4bb" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_fyf_gkq_4bb-d16893e62693-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Edge_Mode/EdgePipelines_title.html" id="concept_fyf_gkq_4bb-d16893e62693-link"><span class="ph">StreamSets Data Collector Edge</span></a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_wwq_gxc_py-d16893e63980" class="topicref" data-id="concept_wwq_gxc_py" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_wwq_gxc_py-d16893e63980-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Multithreaded_Pipelines/MultithreadedPipelines.html#concept_wwq_gxc_py" id="concept_wwq_gxc_py-d16893e63980-link">Multithreaded Pipelines</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_gzw_tdm_p2b-d16893e64187" class="topicref" data-id="concept_gzw_tdm_p2b" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_gzw_tdm_p2b-d16893e64187-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Microservice/Microservice_Title.html#concept_gzw_tdm_p2b" id="concept_gzw_tdm_p2b-d16893e64187-link">Microservice Pipelines</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="Orchestrators_Title-d16893e64348" class="topicref" data-id="Orchestrators_Title" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action Orchestrators_Title-d16893e64348-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Orchestration_Pipelines/OrchestrationPipelines_Title.html#Orchestrators_Title" id="Orchestrators_Title-d16893e64348-link">Orchestration Pipelines</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_wr1_ktz_bt-d16893e64489" class="topicref" data-id="concept_wr1_ktz_bt" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_wr1_ktz_bt-d16893e64489-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/RPC_Pipelines/SDC_RPCpipelines_title.html#concept_wr1_ktz_bt" id="concept_wr1_ktz_bt-d16893e64489-link">SDC RPC Pipelines</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_fpz_5r4_vs-d16893e64679" class="topicref" data-id="concept_fpz_5r4_vs" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_fpz_5r4_vs-d16893e64679-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Cluster_Mode/ClusterPipelines_title.html" id="concept_fpz_5r4_vs-d16893e64679-link">Cluster Pipelines</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_jjk_23z_sq-d16893e65172" class="topicref" data-id="concept_jjk_23z_sq" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_jjk_23z_sq-d16893e65172-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Data_Preview/DataPreview_Title.html#concept_jjk_23z_sq" id="concept_jjk_23z_sq-d16893e65172-link">Data Preview</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_pgk_brx_rr-d16893e65458" class="topicref" data-id="concept_pgk_brx_rr" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_pgk_brx_rr-d16893e65458-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Alerts/RulesAlerts_title.html#concept_pgk_brx_rr" id="concept_pgk_brx_rr-d16893e65458-link">Rules and Alerts</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_asx_fdz_sq-d16893e65960" class="topicref" data-id="concept_asx_fdz_sq" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_asx_fdz_sq-d16893e65960-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Pipeline_Monitoring/PipelineMonitoring_title.html#concept_asx_fdz_sq" id="concept_asx_fdz_sq-d16893e65960-link">Pipeline Monitoring</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_o3l_dtr_5q-d16893e66304" class="topicref" data-id="concept_o3l_dtr_5q" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_o3l_dtr_5q-d16893e66304-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Pipeline_Maintenance/PipelineMaintenance_title.html#concept_o3l_dtr_5q" id="concept_o3l_dtr_5q-d16893e66304-link">Pipeline Maintenance</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_yms_ftm_sq-d16893e66768" class="topicref" data-id="concept_yms_ftm_sq" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_yms_ftm_sq-d16893e66768-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Administration/Administration_title.html#concept_yms_ftm_sq" id="concept_yms_ftm_sq-d16893e66768-link">Administration</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_nls_w1r_ks-d16893e67508" class="topicref" data-id="concept_nls_w1r_ks" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_nls_w1r_ks-d16893e67508-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Tutorial/Tutorial-title.html" id="concept_nls_w1r_ks-d16893e67508-link">Tutorial</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_sh3_frm_tq-d16893e68001" class="topicref" data-id="concept_sh3_frm_tq" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_sh3_frm_tq-d16893e68001-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Troubleshooting/Troubleshooting_title.html#concept_sh3_frm_tq" id="concept_sh3_frm_tq-d16893e68001-link">Troubleshooting</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_xbx_rs1_tq-d16893e68798" class="topicref" data-id="concept_xbx_rs1_tq" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_xbx_rs1_tq-d16893e68798-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Glossary/Glossary_title.html#concept_xbx_rs1_tq" id="concept_xbx_rs1_tq-d16893e68798-link">Glossary</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_jn1_nzb_kv-d16893e68843" class="topicref" data-id="concept_jn1_nzb_kv" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_jn1_nzb_kv-d16893e68843-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Apx-DataFormats/DataFormat_Title.html#concept_jn1_nzb_kv" id="concept_jn1_nzb_kv-d16893e68843-link">Data Formats by Stage</a><div class="wh-tooltip"><p class="shortdesc"></p></div></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_pvm_yt3_wq-d16893e68958" class="topicref" data-id="concept_pvm_yt3_wq" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_pvm_yt3_wq-d16893e68958-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Expression_Language/ExpressionLanguage_title.html" id="concept_pvm_yt3_wq-d16893e68958-link">Expression Language</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_vcj_1ws_js-d16893e69669" class="topicref" data-id="concept_vcj_1ws_js" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_vcj_1ws_js-d16893e69669-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Apx-RegEx/RegEx-Title.html#concept_vcj_1ws_js" id="concept_vcj_1ws_js-d16893e69669-link">Regular Expressions</a></div></div></li><li role="treeitem" aria-expanded="false"><div data-tocid="concept_chv_vmj_wr-d16893e69787" class="topicref" data-id="concept_chv_vmj_wr" data-state="not-ready"><span role="button" tabindex="0" aria-labelledby="button-expand-action concept_chv_vmj_wr-d16893e69787-link" class="wh-expand-btn"></span><div class="title"><a href="../../../datacollector/UserGuide/Apx-GrokPatterns/GrokPatterns_title.html#concept_chv_vmj_wr" id="concept_chv_vmj_wr-d16893e69787-link">Grok Patterns</a></div></div></li></ul></div>
                        

</div>
</nav>
                    


                    
                    <div id="wh_topic_body" class="col-lg-7 col-md-9 col-sm-12">
<button id="wh_close_publication_toc_button" class="close-toc-button d-none" aria-label="Toggle publishing table of content" aria-controls="wh_publication_toc" aria-expanded="true"><span class="close-toc-icon-container"><span class="close-toc-icon"></span></span></button><button id="wh_close_topic_toc_button" class="close-toc-button d-none" aria-label="Toggle topic table of content" aria-controls="wh_topic_toc" aria-expanded="true"><span class="close-toc-icon-container"><span class="close-toc-icon"></span></span></button>

                        
<div class=" wh_topic_content body "><main role="main"><article class="" role="article" aria-labelledby="ariaid-title1"><article class="nested0" aria-labelledby="ariaid-title1" id="concept_jkm_rnz_kx">
    <h1 class="- topic/title title topictitle1" id="ariaid-title1">Converting Data to the Parquet Data Format</h1>
    
    <div class="- topic/body concept/conbody body conbody"><p class="- topic/shortdesc shortdesc"></p>
        <p class="- topic/p p">This solution describes how to convert Avro files to the columnar format, Parquet.</p>
        <p class="- topic/p p">Let's say that you want to store data on HDFS using the columnar format, Parquet. But <span class="- topic/ph ph">Data Collector</span>
            doesn't have a Parquet data format. How do you do it? </p>
        <p class="- topic/p p">The <a class="- topic/xref xref" href="../Event_Handling/EventFramework-Title.html#concept_cph_5h4_lx">event
                framework</a> was created for exactly this purpose. The simple addition of an
            event stream to the pipeline enables the automatic conversion of Avro files to Parquet. </p>
        <p class="- topic/p p">You can use the Spark executor to trigger a Spark application or the MapReduce executor
            to trigger a MapReduce job. This solution uses the MapReduce executor.</p>
        <p class="- topic/p p">It's just a few simple steps:</p>
        <div class="- topic/p p">
            <ol class="- topic/ol ol" id="concept_jkm_rnz_kx__ol_oh1_4gm_lx" data-ofbid="concept_jkm_rnz_kx__ol_oh1_4gm_lx">
                <li class="- topic/li li">Create the pipeline you want to use.<p class="- topic/p p">Use the origin and processors that you
                        need, like any other pipeline. And then, configure Hadoop FS to write Avro
                        data to HDFS.</p><p class="- topic/p p">The Hadoop FS destination generates events each time it
                        closes a file. This is perfect because we want to convert files to Parquet
                        only after they are fully written. </p><div class="- topic/p p">
                        <div class="- topic/note note note note_note"><span class="note__title">Note:</span> To avoid running unnecessary numbers of MapReduce jobs, configure the
                            destination to create files as large as the destination system can
                            comfortably handle.</div>
                    </div><p class="- topic/p p"><img class="- topic/image image" id="concept_jkm_rnz_kx__image_x2b_yvm_lx" src="../Graphics/Event-ParquetBasicPipe.png" height="87" width="622"/></p></li>
                <li class="- topic/li li">Configure Hadoop FS to generate events. <p class="- topic/p p">On the <span class="+ topic/keyword ui-d/wintitle keyword wintitle">General</span>
                        tab of the destination, select the <span class="+ topic/ph ui-d/uicontrol ph uicontrol">Produce Events</span>
                        property.</p><p class="- topic/p p">With this property selected, the event output stream becomes
                        available and Hadoop FS generates an event record each time it closes an
                        output file. The event record includes the file path for the closed file.
                            </p><p class="- topic/p p"><img class="- topic/image image" id="concept_jkm_rnz_kx__image_ibh_mvm_lx" src="../Graphics/Event-ParquetHDFS.png" height="302" width="700"/></p></li>
                <li class="- topic/li li">Connect the Hadoop FS event output stream to a MapReduce executor. <p class="- topic/p p">Now, each
                        time the MapReduce executor receives an event, it triggers the jobs that you
                        configure it to run.</p><p class="- topic/p p"><img class="- topic/image image" id="concept_jkm_rnz_kx__image_wjn_k5m_lx" src="../Graphics/Event-ParquetPipe.png" height="176" width="812"/></p></li>
                <li class="- topic/li li">Configure the MapReduce executor to run a job that converts the completed Avro
                    file to Parquet.<p class="- topic/p p">In the MapReduce executor, configure the MapReduce
                        configuration details and select the Avro to Parquet job type. On the
                            <span class="+ topic/keyword ui-d/wintitle keyword wintitle">Avro Conversion</span> tab, configure the file input and
                        output directory information and related properties: </p><p class="- topic/p p"><img class="- topic/image image" id="concept_jkm_rnz_kx__image_rzm_frb_tx" src="../Graphics/Event-Parquet-MapReduce.png" height="308" width="516"/></p><p class="- topic/p p">The <span class="+ topic/ph ui-d/uicontrol ph uicontrol">Input Avro File</span> property
                        default, <code class="+ topic/ph pr-d/codeph ph codeph">${record:value('/filepath')}</code>, runs the job on the
                        file specified in the Hadoop FS file closure event record. </p><p class="- topic/p p">Then, on
                        the <span class="+ topic/keyword ui-d/wintitle keyword wintitle">Avro to Parquet</span> tab, optionally configure advanced
                        Parquet properties.</p></li>
            </ol>
        </div>
        <p class="- topic/p p">With this event stream added to the pipeline, each time the Hadoop FS destination closes
            a file, it generates an event. When the MapReduce executor receives the event, it kicks
            off a MapReduce job that converts the Avro file to Parquet. Simple!</p>
    </div>
</article></article></main></div>

                        
                        
                        


                    </div>
                    
                </div>
            </div>


        </div> <nav class="navbar navbar-default wh_footer" data-whc_version="25.0">
  <div class=" footer-container  mx-auto">
    <!-- script for Data Collector, all flavors, but only used when accessed directly, not from portal --><script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-60917135-3', 'auto');
  ga('send', 'pageview');
</script>
  </div>
</nav>

        
        <div id="go2top">
            <span class="oxy-icon oxy-icon-up"></span>
        </div>
        
        <!-- The modal container for images -->
        <div id="modal_img_large" class="modal">
            <span class="close oxy-icon oxy-icon-remove"></span>
            <!-- Modal Content (The Image) -->
            <div id="modal_img_container"></div>
            <!-- Modal Caption (Image Text) -->
            <div id="caption"></div>
        </div>
        
        
        Â© 2023 StreamSets, Inc.

    </body>
</html>